Feat(e2e): support multiple aggregators in the e2e tests #2378

jpraynaud · 2025-03-18T16:25:27Z

Content

This PR includes the adaptation of the e2e tests to support multiple aggregators:

- The relay now supports both Passthrough mode (messages are sent to the configured aggregator endpoint) and P2P mode (messages are sent to the P2P network) for both signer registration and signature registration. The configuration options have been updated accordingly.
- The end-to-end test configuration has evolved:
  - number_of_aggregators and number_of_signers are specified instead of number_of_pool_nodes
  - use_p2pmode has been replaced by the more appropriate use_relays
  - relay_signer_registration_mode and relay_signature_registration_mode have been added (used with the use_relays option)
- The RunOnly mode of the e2e test has been adapted to support multiple aggregators concurrently.
- The Spec mode of the e2e test has been adapted to support multiple aggregators concurrently.
- Slave registration of the aggregator has been fixed, as it was not targeting the correct verification keys. The associated integration test has been rewritten for finer-grained testing of the evolving Mithril stake distribution at each epoch.
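
As a rough illustration of the two relay modes listed above, here is a minimal sketch. Only the `SignerRelayMode::Passthrough` variant name appears in the diff of this PR; the `P2P` variant spelling, the `Destination` type, the `route` function and the string parsing are assumptions made for the example and are not the actual `mithril-relay` implementation.

```rust
use std::str::FromStr;

/// Illustrative mirror of the relay modes described in the PR
/// (only `Passthrough` is visible in the diff; the rest is assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignerRelayMode {
    /// Messages are sent to the configured aggregator endpoint.
    Passthrough,
    /// Messages are sent to the P2P network.
    P2P,
}

impl FromStr for SignerRelayMode {
    type Err = String;

    /// Hypothetical, case-insensitive parsing of a mode given on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "passthrough" => Ok(Self::Passthrough),
            "p2p" => Ok(Self::P2P),
            other => Err(format!("unknown relay mode: {other}")),
        }
    }
}

/// Hypothetical description of where a registration message ends up.
#[derive(Debug, PartialEq, Eq)]
pub enum Destination {
    AggregatorEndpoint(String),
    PeerNetwork,
}

/// Picks the destination of a message for a given mode.
pub fn route(mode: SignerRelayMode, aggregator_endpoint: &str) -> Destination {
    match mode {
        SignerRelayMode::Passthrough => {
            Destination::AggregatorEndpoint(aggregator_endpoint.to_string())
        }
        SignerRelayMode::P2P => Destination::PeerNetwork,
    }
}
```

Since the mode is selected independently for signer registration and signature registration, a relay built along these lines would presumably hold two such values, one per message type.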

Pre-submit checklist

Branch
- Tests are provided (if possible)
- Crates versions are updated (if relevant)
- CHANGELOG file is updated (if relevant)
- Commit sequence broadly makes sense
- Key commits have useful messages
PR
- All check jobs of the CI have succeeded
- Self-reviewed the diff
- Useful pull request description
- Reviewer requested
Documentation
- No new TODOs introduced

Issue(s)

Closes #2361

github-actions · 2025-03-18T16:33:32Z

Test Results

3 files ±0 57 suites ±0 11m 45s ⏱️ +16s
1 776 tests +2 1 776 ✅ +2 0 💤 ±0 0 ❌ ±0
2 174 runs +2 2 174 ✅ +2 0 💤 ±0 0 ❌ ±0

Results for commit b48abad. ± Comparison against base commit 12cd9c8.

♻️ This comment has been updated with latest results.

sfauvel

LGTM
Just some remarks

sfauvel · 2025-03-21T08:47:11Z

mithril-aggregator/src/runtime/state_machine.rs

@@ -940,7 +940,7 @@ mod tests {
            runner
                .expect_inform_new_epoch()
                .with(predicate::eq(new_time_point_clone.clone().epoch))
-                .once()
+                .times(2)


Why do we need to change the number of calls?
The modification to the state_machine seems to concern only the slave mode.
Does it mean we are running a slave?
The test name says that it is a master: "idle_new_epoch_detected_and_master_has_transitioned_to_epoch"

mithril-aggregator/tests/create_certificate_slave.rs

mithril-relay/src/relay/signer.rs

sfauvel · 2025-03-21T10:09:45Z

mithril-relay/src/relay/signer.rs

+                    ))),
+                }
+            }
+            SignerRelayMode::Passthrough => {


There is no test in this file covering this code.
Are tests useless here?
Is it tested elsewhere?

mithril-test-lab/mithril-end-to-end/src/mithril/infrastructure.rs

dlachaume

LGTM

mithril-test-lab/mithril-end-to-end/src/main.rs

mithril-common/src/entities/epoch.rs

mithril-relay/src/repeater.rs

mithril-test-lab/mithril-end-to-end/src/mithril/infrastructure.rs

dlachaume · 2025-03-21T11:04:48Z

mithril-test-lab/mithril-end-to-end/src/assertions/exec.rs

+    // This should be removed when the aggregator is able to synchronize its certificate chain from another aggregator
+    if !aggregator.is_first() {
+        tokio::time::sleep(std::time::Duration::from_millis(
+            5 * aggregator.mithril_run_interval() as u64,


Maybe extracting the hardcoded value 5 into a named constant would improve readability and maintainability?
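
One possible shape for that extraction is sketched below. The constant name and the helper function are hypothetical, and the `u32` parameter stands in for the value returned by `aggregator.mithril_run_interval()`, whose actual type is not shown in the diff.

```rust
use std::time::Duration;

/// Hypothetical name: number of run intervals to wait before checking
/// an aggregator that is not the first one.
const RUN_INTERVALS_TO_WAIT_FOR_FOLLOWER: u64 = 5;

/// Computes the wait duration from the run interval in milliseconds.
/// `mithril_run_interval_ms` is an assumed stand-in for
/// `aggregator.mithril_run_interval()`.
pub fn follower_wait_duration(mithril_run_interval_ms: u32) -> Duration {
    Duration::from_millis(RUN_INTERVALS_TO_WAIT_FOR_FOLLOWER * u64::from(mithril_run_interval_ms))
}
```

The call site would then pass `follower_wait_duration(...)` to `tokio::time::sleep` instead of building the duration inline.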

Alenar

LGTM with a few caveats.

mithril-aggregator/tests/create_certificate_slave.rs

mithril-test-lab/mithril-end-to-end/Cargo.toml

Alenar · 2025-03-21T11:43:17Z

mithril-aggregator/src/runtime/state_machine.rs

            self.runner.update_epoch_settings().await?;
+            if self.config.is_slave {
+                self.runner
+                    .synchronize_slave_aggregator_signer_registration()
+                    .await?;
+                // Needed to recompute epoch data for the next signing round on the slave
+                self.runner.inform_new_epoch(new_time_point.epoch).await?;
+            }


Can you explain how this change helps to stabilize the e2e tests? I'm quite puzzled by the fact that we need to call runner.inform_new_epoch twice.

From what I understand, this doesn't impact the methods called between the two inform_new_epoch calls:

- runner.upkeep call should not be impacted
- open_signer_registration_round does nothing on a slave
- update_epoch_settings should not be impacted, as the data registered by the epoch service (protocol parameters and transactions signing config) doesn't depend on the master aggregator

The functional impacts should be:

- the epoch service will expose an incorrect list of next_signers in the interval between the two inform_new_epoch calls
- the epoch service will be ready earlier, since a first inform_new_epoch call will be done without needing a roundtrip to the master aggregator

Is the last point the problem on a fast network? Maybe the synchronizer should be able to "edit" the next signers in the epoch_service instead?
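
The first functional impact above can be pictured with a toy model. Everything in this sketch is hypothetical (`SignerStore`, `ToyEpochService`, `synchronize_from_master`, `replay_slave_epoch_transition`); it only assumes that the epoch service snapshots the registered signers when it is informed of a new epoch, which may not match the real aggregator.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the signer registrations stored by a slave aggregator.
#[derive(Default)]
pub struct SignerStore {
    registered: BTreeSet<String>,
}

/// Toy stand-in for the epoch service: it snapshots the signers
/// each time it is informed of a new epoch (an assumption of this sketch).
#[derive(Default)]
pub struct ToyEpochService {
    next_signers: Vec<String>,
}

impl ToyEpochService {
    pub fn inform_new_epoch(&mut self, store: &SignerStore) {
        self.next_signers = store.registered.iter().cloned().collect();
    }

    pub fn next_signers(&self) -> &[String] {
        &self.next_signers
    }
}

/// Toy synchronization: copies the master's registrations into the slave store.
pub fn synchronize_from_master(slave: &mut SignerStore, master_signers: &[&str]) {
    slave.registered = master_signers.iter().map(|s| s.to_string()).collect();
}

/// Replays the call order under discussion and returns the signers seen
/// between the two `inform_new_epoch` calls and after the second one.
pub fn replay_slave_epoch_transition(master_signers: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut store = SignerStore::default();
    let mut epoch_service = ToyEpochService::default();

    // First call: the slave has not synchronized with the master yet.
    epoch_service.inform_new_epoch(&store);
    synchronize_from_master(&mut store, master_signers);
    let between_calls = epoch_service.next_signers().to_vec();

    // Second call: the snapshot is recomputed from the synchronized store.
    epoch_service.inform_new_epoch(&store);
    (between_calls, epoch_service.next_signers().to_vec())
}
```

In this model the list seen between the two calls is simply empty, whereas in the real service it would be whatever was computed before synchronization. The alternative raised in the comment, letting the synchronizer update the next signers directly, would remove that window without a second `inform_new_epoch` call.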

mithril-test-lab/mithril-end-to-end/src/assertions/wait.rs

mithril-test-lab/mithril-end-to-end/src/mithril/infrastructure.rs

mithril-test-lab/mithril-end-to-end/src/end_to_end_spec.rs

jpraynaud self-assigned this Mar 18, 2025

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 99077bb to 9ad40fc Compare March 18, 2025 16:50

jpraynaud temporarily deployed to testing-preview March 18, 2025 17:01 — with GitHub Actions Inactive

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 5 times, most recently from cf5d195 to 7c6a300 Compare March 19, 2025 17:47

jpraynaud temporarily deployed to testing-preview March 19, 2025 17:58 — with GitHub Actions Inactive

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 5 times, most recently from de78895 to 42b4d01 Compare March 20, 2025 18:34

jpraynaud marked this pull request as ready for review March 20, 2025 18:35

jpraynaud requested review from Alenar, sfauvel and dlachaume March 20, 2025 18:35

sfauvel approved these changes Mar 21, 2025

dlachaume approved these changes Mar 21, 2025

Alenar approved these changes Mar 21, 2025

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch 2 times, most recently from b464797 to 9e47514 Compare March 21, 2025 17:55

Alenar force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 9e47514 to be638e5 Compare March 24, 2025 12:19

jpraynaud added 5 commits March 24, 2025 18:17

feat(e2e): update e2e test command line arguments

a9d51f6

feat(e2e): 'MithrilInfrastructure' supports multiple aggregators

6eddc27

First aggregator is 'master', and others (if any) are 'slave' to the 'master'.

refactor(e2e): implement interior mutability for 'Aggregator', 'RunOnly' and 'Spec'

b917665

refactor(relay): support for dialing to peer in relays of e2e test

f92420c

refactor(e2e): enhance 'MithrilInfrastructure' support for multiple aggregators

2d948a0

Better P2P relays topology and fix log files collisions.

jpraynaud and others added 25 commits March 24, 2025 18:17

feat(e2e): 'RunOnly' supports multiple aggregators

a6ffd0f

feat(e2e): 'Spec' supports multiple aggregators

5e1bde8

feat(e2e): runner supports multiple aggregators

62f183c

refactor(e2e): enhance naming of aggregators and associated relays

c3eda6e

refactor(e2e): enhance assertions checks

596387f

By providing information about the targeted aggregator in logs and errors.

fix(common): enhance Certificate display implementation

321c809

fix(aggregator): integration test for slave uses evolving Mithril stake distribution

fe2592d

fix(common): avoid too low stake in random stake distribution

5cb1589

Which could prevent signature from signers even with loose protocol parameters.

fix(aggregator): slave signer registration stabilization

ddcd10e

feat(relay): implement signer relay modes

9dd961f

Which can be 'Passthrough' or 'P2P'.

refactor(e2e): use signer relay modes in e2e test

21673ab

refactor(ci): update e2e tests in CI to use the signer relay modes

ccae2ac

refactor(e2e): better naming for aggregators in e2e tests

24dac32

fix(e2e): delegate stakes only from the first aggregator

eea39c8

refactor(e2e): remove distinction master/slave aggregator

f71f6b0

As master/slave signer registration is only one of the configurations to be tested.

fix(e2e): make genesis bootstrap error retryable

bac426d

Until we can fix the source of flakiness.

fix(e2e): flakiness in the genesis bootstrap of slave aggregators

024fcf7

refactor(e2e): enhance assertions logs with aggregator name

320e695

fix(ci): wrong format for next era in some e2e scenarios

cc61c42

fix(e2e): era switch done on multiple aggregators

86f4460

refactor(aggregator): simplify slave aggregator integration test

8104405

- Removed last epoch which was not necessary
- Removed unnecessary cycles
- Reduced the number of signers per epoch
- Use of 'checked_sub' in the 'EpochFixturesMapBuilder'.

refactor(e2e): better parameter handling with clap

8488253

chore(e2e): apply review comments

b1e0609

refactor(e2e): avoid rwlock of option in specs

39ab546

refactor(e2e): simplify chain observer usage

4c3dded

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from be638e5 to 3794e05 Compare March 24, 2025 17:18

jpraynaud added 3 commits March 24, 2025 18:58

refactor(e2e): remove dependency to 'mithril-relay'

922d28a

refactor(relay): create function converting mpsc transmission result to HTTP response in signer

fef7e16

wip(ci): DO NOT MERGE

b48abad

jpraynaud force-pushed the jpraynaud/2361-e2e-test-slave-aggregator branch from 3794e05 to b48abad Compare March 24, 2025 18:01
